.NET setup and example for SynapseML
Installation
1. Install .NET
To start building .NET apps, you need to download and install the .NET SDK (Software Development Kit).
Download and install the .NET Core SDK. Installing the SDK adds the dotnet toolchain to your PATH.
Once you've installed the .NET Core SDK, open a new command prompt or terminal. Then run dotnet.
If the command runs and prints information about how to use dotnet, you can move to the next step.
If you receive a 'dotnet' is not recognized as an internal or external command error, make sure you opened a new command prompt or terminal before running the command.
2. Install Java
Install Java 8 for Windows and macOS, or OpenJDK 8 for Ubuntu.
Select the appropriate version for your operating system. For example, select jdk-8u201-windows-x64.exe for a Windows x64 machine or jdk-8u231-macosx-x64.dmg for macOS. Then run java -version to verify the installation.
3. Install Apache Spark
Download and install Apache Spark version 3.2.0 or later. (SynapseML v1.0.2 only supports Spark versions >= 3.2.0.)
Extract the downloaded archive (with the 7-Zip app on Windows or tar on Linux) and remember the location of the extracted files; this guide uses ~/bin/spark-3.2.0-bin-hadoop3.2/ as an example.
Run the following commands to set the environment variables used to locate Apache Spark. On Windows, make sure to run the command prompt in administrator mode.
- Windows
setx /M HADOOP_HOME C:\bin\spark-3.2.0-bin-hadoop3.2\
setx /M SPARK_HOME C:\bin\spark-3.2.0-bin-hadoop3.2\
setx /M PATH "%PATH%;%HADOOP_HOME%;%SPARK_HOME%bin" # Warning: Don't run this if your PATH is already long, as setx truncates the value to 1024 characters and can remove entries!
- Mac/Linux
export SPARK_HOME=~/bin/spark-3.2.0-bin-hadoop3.2/
export PATH="$SPARK_HOME/bin:$PATH"
source ~/.bashrc
Once you've installed everything and set your environment variables, open a new command prompt or terminal and run the following command:
spark-submit --version
If the command runs and prints version information, you can move to the next step.
If you receive a 'spark-submit' is not recognized as an internal or external command error, make sure you opened a new command prompt.
4. Install .NET for Apache Spark
Download the Microsoft.Spark.Worker v2.1.1 release from the .NET for Apache Spark GitHub. For example, if you're on a Windows machine and plan to use .NET Core, download the Windows x64 netcoreapp3.1 release.
Extract Microsoft.Spark.Worker and remember the location.
5. Install WinUtils (Windows Only)
.NET for Apache Spark requires WinUtils to be installed alongside Apache Spark. Download winutils.exe. Then, copy WinUtils into C:\bin\spark-3.2.0-bin-hadoop3.2\bin.
If you're using a different version of Hadoop, select the version of WinUtils that's compatible with your version of Hadoop. You can see the Hadoop version at the end of your Spark install folder name.
6. Set DOTNET_WORKER_DIR and check dependencies
Run one of the following commands to set the DOTNET_WORKER_DIR environment variable, which .NET apps use to locate the .NET for Apache Spark worker binaries. Make sure to replace <PATH-DOTNET-WORKER-DIR> with the directory where you downloaded and extracted Microsoft.Spark.Worker. On Windows, make sure to run the command prompt in administrator mode.
- Windows
setx /M DOTNET_WORKER_DIR <PATH-DOTNET-WORKER-DIR>
- Mac/Linux
export DOTNET_WORKER_DIR=<PATH-DOTNET-WORKER-DIR>
Finally, double-check that you can run dotnet, java, and spark-shell from your command line before you move to the next section.
Write a .NET for SynapseML App
1. Create a console app
In your command prompt or terminal, run the following commands to create a new console application:
dotnet new console -o SynapseMLApp
cd SynapseMLApp
The dotnet new console command creates a new console application for you. The -o parameter creates a directory named SynapseMLApp where your app is stored and populates it with the required files. The cd SynapseMLApp command changes the directory to the app directory you created.
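For reference, the generated Program.cs starts out as a minimal Hello World program; the exact contents depend on your SDK version, but a .NET Core 3.1 style template looks roughly like this (you'll replace it in step 3):
using System;

namespace SynapseMLApp
{
    class Program
    {
        // Default template code generated by dotnet new console; replaced later in this tutorial.
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
        }
    }
}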
2. Install NuGet package
To use .NET for Apache Spark in an app, install the Microsoft.Spark package. In your command prompt or terminal, run the following command:
dotnet add package Microsoft.Spark --version 2.1.1
This tutorial uses Microsoft.Spark version 2.1.1 because SynapseML 1.0.2 depends on it. Change it to the corresponding version if necessary.
To use SynapseML features in the app, install the corresponding SynapseML.X package. This tutorial uses SynapseML.Cognitive as an example. In your command prompt or terminal, run the following commands:
# Update Nuget Config to include SynapseML Feed
dotnet nuget add source https://mmlspark.blob.core.windows.net/synapsemlnuget/index.json -n SynapseMLFeed
dotnet add package SynapseML.Cognitive --version 1.0.2
The dotnet nuget add source command registers the SynapseML feed as a package source so that the package can be resolved.
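After these commands succeed, your SynapseMLApp.csproj should contain package references along these lines (a sketch assuming the versions used above):
<ItemGroup>
  <!-- Added by the dotnet add package commands above -->
  <PackageReference Include="Microsoft.Spark" Version="2.1.1" />
  <PackageReference Include="SynapseML.Cognitive" Version="1.0.2" />
</ItemGroup>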
3. Write your app
Open Program.cs in Visual Studio Code, or any text editor. Replace its contents with this code:
using System;
using System.Collections.Generic;
using Synapse.ML.Cognitive;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;

namespace SynapseMLApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Create Spark session
            SparkSession spark =
                SparkSession
                    .Builder()
                    .AppName("TextSentimentExample")
                    .GetOrCreate();

            // Create DataFrame
            DataFrame df = spark.CreateDataFrame(
                new List<GenericRow>
                {
                    new GenericRow(new object[] {"I am so happy today, its sunny!", "en-US"}),
                    new GenericRow(new object[] {"I am frustrated by this rush hour traffic", "en-US"}),
                    new GenericRow(new object[] {"The Azure AI services on spark aint bad", "en-US"})
                },
                new StructType(new List<StructField>
                {
                    new StructField("text", new StringType()),
                    new StructField("language", new StringType())
                })
            );

            // Create TextSentiment
            var model = new TextSentiment()
                .SetSubscriptionKey("YOUR_SUBSCRIPTION_KEY")
                .SetLocation("eastus")
                .SetTextCol("text")
                .SetOutputCol("sentiment")
                .SetErrorCol("error")
                .SetLanguageCol("language");

            // Transform
            var outputDF = model.Transform(df);

            // Display results
            outputDF.Show();

            // Stop Spark session
            spark.Stop();
        }
    }
}
SparkSession is the entry point of Apache Spark applications; it manages the context and information of your application. A DataFrame is a way of organizing data into a set of named columns.
Create a TextSentiment instance, set the subscription key and other configuration, and then apply the transformation to the DataFrame. The transformation analyzes the sentiment of each row and stores the result in the output column.
The result of the transformation is stored in another DataFrame. At this point, no operations have taken place because .NET for Apache Spark lazily evaluates the data. The operation defined by the call to model.Transform doesn't execute until the Show method is called to display the contents of the transformed DataFrame to the console. Once you no longer need the Spark session, use the Stop method to stop your session.
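If you want to keep working with the transformed DataFrame before displaying it, you can use the regular DataFrame API on outputDF. As a minimal sketch (not part of the tutorial app), assuming the code above, you could show only the text and sentiment columns:
// Select just the input text and the sentiment result, then trigger evaluation with Show.
outputDF
    .Select("text", "sentiment")
    .Show(3, 80);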
4. Run your .NET App
Run the following command to build your application:
dotnet build
Navigate to your build output directory. For example, on Windows you could run cd bin\Debug\net5.0.
Use the spark-submit command to submit your application to run on Apache Spark.
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --packages com.microsoft.azure:synapseml_2.12:1.0.2 --master local microsoft-spark-3-2_2.12-2.1.1.jar dotnet SynapseMLApp.dll
--packages com.microsoft.azure:synapseml_2.12:1.0.2 specifies the dependency on synapseml_2.12 version 1.0.2; microsoft-spark-3-2_2.12-2.1.1.jar specifies Microsoft.Spark version 2.1.1 and Spark version 3.2.
This command assumes you have downloaded Apache Spark and added it to your PATH environment variable so that you can use spark-submit. Otherwise, you'd have to use the full path (for example, C:\bin\apache-spark\bin\spark-submit or ~/spark/bin/spark-submit).
When your app runs, the sentiment analysis result is written to the console.
+-----------------------------------------+--------+-----+--------------------------------------------------+
| text|language|error| sentiment|
+-----------------------------------------+--------+-----+--------------------------------------------------+
| I am so happy today, its sunny!| en-US| null|[{positive, null, {0.99, 0.0, 0.0}, [{I am so h...|
|I am frustrated by this rush hour traffic| en-US| null|[{negative, null, {0.0, 0.0, 0.99}, [{I am frus...|
| The Azure AI services on spark aint bad| en-US| null|[{positive, null, {0.99, 0.01, 0.00}, [{The cogn...|
+-----------------------------------------+--------+-----+--------------------------------------------------+
Congratulations! You successfully authored and ran a .NET for SynapseML app. Refer to the developer docs for API guidance.
Next
- Refer to this tutorial for deploying a .NET app to Databricks.
- You can download the compatible install-worker.sh and db-init.sh files needed for deployment on Databricks.